Adding Fault-Tolerance to a Hierarchical DRE System
Abstract
Dynamic resource management is a crucial part of the infrastructure for emerging mission-critical distributed real-time embedded systems. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes an ongoing effort to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we have encountered, including multi-tiered structure, rapid recovery, the characteristics of component middleware, and the co-existence of replicated and non-replicated elements. While some of these challenges have been investigated before, this work exhibits all of them simultaneously, presenting a significant fault-tolerance research challenge.
Similar resources
Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems
Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, that operate within resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties requires the middleware to ...
Designing Fault tolerant Mission-Critical Middleware Infrastructure for Distributed Real-time and Embedded Systems
Fault tolerance is a crucial design consideration for mission-critical distributed real-time and embedded (DRE) systems, such as avionics mission computing systems, and supervisory control and data acquisition systems. More and more of these systems are created using emerging middleware standards, such as publish-subscribe communication services and component-based architectures. Most previo...
MDDPro: Model-Driven Dependability Provisioning in Enterprise Distributed Real-Time and Embedded Systems
Service oriented architecture (SOA) design principles are increasingly being adopted to develop distributed real-time and embedded (DRE) systems, such as avionics mission computing, due to the availability of real-time component middleware platforms. Traditional approaches to fault tolerance that rely on replication and recovery of a single server or a single host do not work in this paradigm s...
Model-Driven Fault-Tolerance Provisioning for Component-Based Distributed Real-Time Embedded Systems
Developing distributed real-time and embedded (DRE) systems requires effective strategies to simultaneously handle the challenges of networked systems, enterprise systems, and embedded systems. The component-based model is gaining prominence for the development of DRE systems because of its emphasis on composability, reuse, excellent support for separation of concerns, and explicit staging of develo...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is a recent technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtualization technology. Workflow user applications consist of a set of tasks to be processed within the cloud environment. Scheduling algorithms strongly affect the efficiency of cloud computing environments through selection of su...